SUDS: Automatic Parallelization for Raw Processors
Abstract
A computer can never be too fast or too cheap. Computer systems pervade nearly every aspect of science, engineering, communications and commerce because they perform certain tasks at rates unachievable by any other kind of system built by humans. A computer system’s throughput, however, is constrained by that system’s ability to find concurrency. Given a particular target workload, the computer architect’s role is to design mechanisms to find and exploit the available concurrency in that workload. This thesis describes SUDS (Software Un-Do System), a compiler and runtime system that can automatically find and exploit the available concurrency of scalar operations in imperative programs with arbitrary unstructured and unpredictable control flow. The core compiler transformation that enables this is scalar queue conversion. Scalar queue conversion makes scalar renaming an explicit operation through a process similar to closure conversion, a technique traditionally used to compile functional languages. The scalar queue conversion compiler transformation is speculative, in the sense that it may introduce dynamic memory allocation operations into code that would not otherwise dynamically allocate memory. Thus, SUDS also includes a transactional runtime system that periodically checkpoints machine state, executes code speculatively, checks whether the speculative execution produced results consistent with the original sequential program semantics, and then either commits or rolls back the speculative execution path. In addition to safely running scalar queue converted code, the SUDS runtime system safely permits threads to speculatively run in parallel and concurrently issue memory operations, even when the compiler is unable to prove that the reordered memory operations will always produce correct results.
Using this combination of compile-time and runtime techniques, SUDS can find concurrency in programs where previous compiler-based renaming techniques fail because the programs contain unstructured loops, and where Tomasulo’s algorithm fails because it sequentializes mispredicted branches. Indeed, we describe three application programs with unstructured control flow where the prototype SUDS system, running in software on a Raw microprocessor, achieves speedups equivalent to, or better than, those of an idealized, and unrealizable, model of a hardware implementation of Tomasulo’s algorithm.
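The checkpoint, speculate, check, and commit-or-roll-back cycle described above can be illustrated with a small sketch. This is not the SUDS implementation: the names SpeculativeRun and run_speculatively, the dictionary standing in for memory, and the conflict test (a chunk fails if it read an address that an earlier chunk wrote) are all assumptions made for illustration, and the "parallel" chunks are simulated one after another.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeRun:
    """Hypothetical per-chunk buffer: logs reads and holds stores until commit."""
    memory: dict
    reads: set = field(default_factory=set)
    writes: dict = field(default_factory=dict)

    def load(self, addr):
        if addr in self.writes:      # a chunk sees its own buffered stores
            return self.writes[addr]
        self.reads.add(addr)         # remember what was read from shared state
        return self.memory[addr]

    def store(self, addr, value):
        self.writes[addr] = value    # buffered; invisible to other chunks


def run_speculatively(memory, chunks):
    """Execute chunks against one checkpoint, then commit in program order.

    Returns the number of chunks that were rolled back and re-executed.
    """
    # Speculative phase: memory is untouched here, so it acts as the checkpoint.
    runs = []
    for chunk in chunks:
        run = SpeculativeRun(memory)
        chunk(run)
        runs.append(run)

    rollbacks = 0
    dirty = set()                    # addresses written by chunks committed so far
    for chunk, run in zip(chunks, runs):
        if run.reads & dirty:        # stale read: sequential semantics violated
            rollbacks += 1
            run = SpeculativeRun(memory)   # discard the buffer and re-execute
            chunk(run)
        memory.update(run.writes)    # commit the buffered stores
        dirty.update(run.writes)
    return rollbacks


# Three hypothetical chunks; chunk_b depends on the value chunk_a writes.
def chunk_a(run): run.store("b", run.load("a") + 1)
def chunk_b(run): run.store("c", run.load("b") * 10)
def chunk_c(run): run.store("d", run.load("a") * 2)

memory = {"a": 1, "b": 0, "c": 0, "d": 0}
rollbacks = run_speculatively(memory, [chunk_a, chunk_b, chunk_c])
```

In this sketch only chunk_b is rolled back, because it speculatively read "b" before chunk_a's store was committed; chunk_a and chunk_c commit their first attempt. Committing in program order is what keeps the final state identical to a sequential run.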
Similar resources
Scalar Queue Conversion: Dynamic Single Assignment for Concurrent Scheduling
This paper describes scalar queue conversion, a compiler transformation that makes scalar renaming an explicit operation through a process similar to closure conversion. We demonstrate how to use scalar queue conversion to slice a flow graph into two executable parts. When executed, the backward slice creates queues of suspended computations (continuations). At any point in time execution of th...
A Software Framework for Supporting General Purpose Applications on Raw Computation Fabrics
This paper presents SUDS (Software Un-Do System), a data speculation system for Raw processors. SUDS manages speculation in software. The key to managing speculation in software is to use the compiler to minimize the number of data items that need to be managed at runtime. Managing speculation in software enables Raw processors to achieve good performance on integer applications without sacrifi...
Mixed Large-Eddy Simulation Model for Turbulent Flows across Tube Bundles Using Parallel Coupled Multiblock NS Solver
In this study, turbulent flow around a tube bundle on a non-orthogonal grid is simulated using the Large Eddy Simulation (LES) technique and parallelization of the fully coupled Navier-Stokes (NS) equations. To model the small eddies, the Smagorinsky model and a mixed model were used. This model represents the effect of dissipation and the grid-scale and subgrid-scale interactions. The fully coupled NS eq...
Programming, Tuning and Automatic Parallelization of Irregular Divide-and-Conquer Applications in DAMPVM/DAC
This paper presents a new object-oriented framework, DAMPVM/DAC, which is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime. The processes are then mapped dynamically to processors, taking into account their speeds and even loads by other user processes. The paper presents the programming interface (API) of the framework, ...